Temporal difference learning is favored for rewards, but not punishments, in simulations and human behavior

Authors

  • Adam Morris
  • Fiery Cushman
Abstract

Evidence indicates that dopaminergic neurons in the basal ganglia implement a form of temporal difference (TD) reinforcement learning. Yet, while phasic dopamine levels encode prediction errors for rewarding outcomes, the encoding of punishing outcomes is weaker and less precise. We posit that this asymmetry between reward and punishment reflects functional design. To test this hypothesis, we constructed a reinforcement learning algorithm that parameterizes TD learning separately for reward and punishment. We find that the optimal model relies on TD learning for rewards alone. Moreover, this differentiated model provides a significantly better fit to human behavioral data, which likewise show greater reliance on TD learning for rewards than for punishments. This may be because information about future rewards must shape an earlier sequence of choices, while information about future punishments need only bias the immediately preceding choice.
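The idea of parameterizing TD learning separately for reward and punishment can be illustrated with a minimal sketch. It assumes a tabular, SARSA(λ)-style update with eligibility traces for rewards and, by default, an update for punishments that credits only the final choice in an episode. The class `SplitValueLearner`, its parameters (`alpha`, `gamma`, `lam`, `td_for_punishment`), and the two-step example task are hypothetical illustrations, not the authors' model or fitted values.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SplitValueLearner:
    """Hypothetical learner with separate value tables for reward and punishment.

    Rewards are learned with tabular SARSA(lambda). Punishments are learned the
    same way only if td_for_punishment is True; otherwise (the default) they
    update only the final action. Illustration only, not the paper's model.
    """
    alpha: float = 0.5               # learning rate (assumed value)
    gamma: float = 1.0               # discount factor (assumed value)
    lam: float = 1.0                 # eligibility-trace decay (assumed value)
    td_for_punishment: bool = False  # hypothetical switch: TD learning for punishments?
    q_reward: defaultdict = field(default_factory=lambda: defaultdict(float))
    q_punish: defaultdict = field(default_factory=lambda: defaultdict(float))

    def value(self, state, action):
        # Net value: learned reward magnitude minus learned punishment magnitude.
        return self.q_reward[(state, action)] - self.q_punish[(state, action)]

    def learn_episode(self, trajectory, outcome):
        """trajectory: (state, action) pairs in order; outcome: scalar after the last action."""
        reward, punishment = max(outcome, 0.0), max(-outcome, 0.0)
        self._td_lambda(self.q_reward, trajectory, reward)
        if self.td_for_punishment:
            self._td_lambda(self.q_punish, trajectory, punishment)
        else:
            # No TD: the punishment updates only the immediately preceding choice.
            last = trajectory[-1]
            self.q_punish[last] += self.alpha * (punishment - self.q_punish[last])

    def _td_lambda(self, table, trajectory, outcome):
        traces = defaultdict(float)  # accumulating eligibility traces
        for t, key in enumerate(trajectory):
            if t == len(trajectory) - 1:
                target = outcome  # terminal step: no future value to bootstrap from
            else:
                target = self.gamma * table[trajectory[t + 1]]  # bootstrap from next choice
            delta = target - table[key]  # TD prediction error
            traces[key] += 1.0
            for k, e in traces.items():
                table[k] += self.alpha * delta * e
            for k in traces:
                traces[k] *= self.gamma * self.lam


# Two-step example: a first choice leads to a second choice, then an outcome.
learner = SplitValueLearner()
learner.learn_episode([("start", "left"), ("room", "press")], outcome=1.0)
learner.learn_episode([("start", "right"), ("cellar", "press")], outcome=-1.0)
print(learner.value("start", "left"))    # 0.5: reward credit reached the first choice
print(learner.value("start", "right"))   # 0.0: punishment did not
print(learner.value("cellar", "press"))  # -0.5: punishment biased the last choice
```

In this sketch the reward propagates back to the earlier choice through the TD error and eligibility trace, while the punishment affects only the choice that immediately preceded it, mirroring the asymmetry proposed above. Setting `td_for_punishment=True` lets punishment propagate back to the first choice as well.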


Similar resources

Short-term memory traces for action bias in human reinforcement learning.

Recent experimental and theoretical work on reinforcement learning has shed light on the neural bases of learning from rewards and punishments. One fundamental problem in reinforcement learning is the credit assignment problem, or how to properly assign credit to actions that lead to reward or punishment following a delay. Temporal difference learning solves this problem, but its efficiency can...

Distinct medial temporal networks encode surprise during motivation by reward versus punishment.

Adaptive motivated behavior requires predictive internal representations of the environment, and surprising events are indications for encoding new representations of the environment. The medial temporal lobe memory system, including the hippocampus and surrounding cortex, encodes surprising events and is influenced by motivational state. Because behavior reflects the goals of an individual, we...

The medial orbitofrontal cortex encodes a general unsigned value signal during anticipation of both appetitive and aversive events.

The medial orbitofrontal cortex (mOFC)/ventromedial prefrontal cortex (vmPFC) has been proposed to signal the expected value of rewards when learning stimuli-rewards associations. Yet, it is still unclear whether identical or distinct orbitofrontal cortex regions encode expected rewards and punishments at the time of the cue during appetitive and aversive classical conditioning. Moreover, it is...

Reward Processing: a Global Brain Phenomenon?

Rewards and punishments (reinforcement) powerfully shape behavior. Accordingly, their neuronal representation is of significant interest, both for understanding normal brain-behavior relationships and the pathophysiology of disorders such as depression and addiction. A recent article by Vickery and colleagues in Neuron provides evidence that the neural response to rewards and ...

Neuro Forum. Reward processing: a global brain phenomenon?

Clark AM. Reward processing: a global brain phenomenon? J Neurophysiol 109: 1–4, 2013. First published July 18, 2012; doi:10.1152/jn.00070.2012.—Rewards and punishments (reinforcement) powerfully shape behavior. Accordingly, their neuronal representation is of significant interest, both for understanding normal brain-behavior relationships and the pathophysiology of disorders such as depression...

Publication date: 2014